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ABSTRACT 

We present a method to simulate deep sky images, including realistic galaxy morpholo- 
gies and telescope characteristics. To achieve a wide diversity of simulated galaxy mor- 
phologies, we first use the shapelets formalism to parametrize the shapes of all objects 
in the Hubble Deep Fields. We measure this distribution of real galaxy morphologies 
in shapelet parameter space, then resample it to generate a new population of ob- 
jects. These simulated galaxies can contain spiral arms, bars, discs, arbitrary radial 
profiles and even dust lanes or knots. To create a final image, we also model observa- 
tional effects, including noise, pixellisation, astrometric distortions and a Point-Spread 
Function. We demonstrate that they are realistic by showing that simulated and real 
data have consistent distributions of morphology diagnostics: including galaxy size, 
ellipticity, concentration and asymmetry statistics. Sample images are made available 
on the world wide web. These simulations are useful to develop and calibrate precision 
image analysis techniques for photometry, astrometry, and shape measurement. They 
can also be used to assess the sensitivity of future telescopes and surveys for applica- 
tions such as supernova searches, microlensing, proper motions, and weak gravitational 
lensing. 

Key words: galaxies: fundamental parameters, statistics - methods: statistical - 
gravitational lensing. 



1 INTRODUCTION 

As astronomical surveys are growing in size and scope, so im- 
age analysis methods are increasing in complexity and accu- 
racy. In order to calibrate these new methods, it is essential 
to have a large sample of images containing objects whose 
properties are already known. Since real data is subject to 
the uncertainties of observational noise, telescope aberration 
and seeing, several packages have been developed to man- 
ufacture artificial images (e.g. Skymaker (see Erben et al. 
2001) or artdata in IRAF (Tody 1993)). The accuracy of 
image analysis methods can then be assessed by comparing 
their output to the known input image properties that were 
specified before the addition of such observational effects. 

The image simulation packages currently available are 
particularly valuable for imitating deep ground-based data. 
However, they limit themselves to a representation of galax- 
ies as parametric forms such as symmetric de Vaucouleurs 
or exponential profiles. Deep space-based images, on the 
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other hand, contain many irregular or asymmetrical galaxies 
with complex resolved features such as spiral arms, mergers 
and dust lanes. One possibility for simulating space images, 
utilised by Bouwens, Broadhurst & Silk (1998), is to re- 
peatedly reuse well-resolved galaxies from the Hubble Deep 
Fields (HDFs; Williams et al. 1996, 1998). However, this re- 
stricts us to morphology templates from a relatively bright 
and nearby sample. Fainter galaxies cannot be used be- 
cause they have been significantly contaminated with back- 
ground noise. Consequently, the morphological properties of 
the faint galaxy population are not fairly represented. This 
method also faces the difficulty that the same real galax- 
ies must be reused many times within one simulation. Al- 
though the HDFs are indeed very deep (If814w=27.60 at 
10<t, Williams et al. 1996), they only cover a small area 
(~6 square arcminutes each) and contain a finite number 
of galaxies. Even if we were to source our real galaxies from 
larger surveys such as the Groth strip (Groth et al. 1994) or 
the Medium Deep Survey (Ratnatunga, Griffiths & Ostran- 
der 1999), we would still face the difficulty of using particular 
real galaxies many times in a large simulation. 
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In this paper, we present a method for simulating deep 
images that contain genuinely unique objects, yet replicate 
the morphological distribution of galaxies in the HDF at all 
depths. This method has the advantage of allowing us to 
simulate arbitrarily large, deep surveys with no repetition 
of galaxy shapes. It also allows us to know accurately the 
intrinsic properties of each galaxy, before adding telescope- 
specific noise properties, systematic effects and convolution 
with a Point-Spread Function (PSF). 

Our method is to decompose all objects in the HDFs 
into shapelet parametrizations, following the formalism in- 
troduced by Refregier (2003, hereafter Shapelets I) and Re- 
fregier & Bacon (2003; hereafter Shapelets II). Using just 
a few coefficients, these can completely quantify the shape 
properties of all galaxies, including spiral arms, bars and 
arbitrary radial profiles. We then model their distribution 
of shapelet coefficients, and draw from this probability dis- 
tribution new sets of shapelet coefficients, representing new 
galaxies. In particular, we take into account the covariancc 
between shapelet coefficients so that, for example, shapes 
depend upon magnitude and size (e.g. faint galaxies appear 
more irregular than bright ones). In this method, we there- 
fore do not input any model of physical morphology or evo- 
lution. Rather, we exclusively use the measured statistics of 
the shapelet coefficient distributions from a real galaxy sam- 
ple, as a function of magnitude and size. The new galaxy 
images can then be analytically convolved with any PSF, 
pixellated, and given an appropriate amount of noise for 
any exposure time down to the depth of the HDF. 

These simulations have several significant applications. 
We can use them to calibrate the effectiveness of image anal- 
ysis and detection methods such as SExtractor (Bertin & 
Arnouts 1996), imcat (Kaiser, Squire & Broadhurst 1995), 
GIM2D (Simard 1998), GALFIT (Peng et al. 2002) and wavelet 
routines (e.g. Meyer 1993). By examining the errors on shape 
measurement at various signal-to-noise levels of galaxy de- 
tection, we can also predict the accuracy of future experi- 
ments requiring accurate shape measurement. An example 
of this for space-based cosmic shear surveys is presented in 
Massey et al. (2003). 

This paper is organized as follows. In §2 we give a brief 
overview of the shapelet formalism and describe how the 
HDF galaxies are modelled using shapelet coefficients. In §3, 
we show how the properties of the shapelet basis functions 
make them eminently suitable for this method. In §4 we 
discuss the means by which we recover a smooth probability 
distribution of galaxy morphologies in shapelet parameter 
space. In §5 we generate new galaxies by resampling the 
distribution. We then add observational noise and show an 
example of the final simulated images. 

We then demonstrate that the simulations do indeed 
have similar properties to the HDFs. For this purpose, 
we consider in §6 commonly used quantifiers for galaxy 
morphology. We find good agreement between simulations 
and the real HDF galaxies for measures such as the size- 
magnitude distribution, ellipticity, concentration, asymme- 
try and dumpiness indices (e.g. Bershady et al. 2000, Con- 
selice et al. 2000a). It is this agreement which is the fi- 
nal justification of our shapelet-based simulation method. 
We compare our method to others in §7 and summarise 
our findings in §8. Sample images may be downloaded from 
http : //www . ast . cam . ac . uk/ ~r jm/ shapelets. 



2 SHAPELET SOURCE CATALOGUE 

In this section, we describe the detection of HDF galaxies 
and their modelling as shapelets. This procedure creates a 
parametrized catalogue of real galaxy morphologies, which 
we will require later. 



2.1 Source detection 

Objects are initially detected using SExtractor (Bertin & 
Arnouts 1996) upon the HDF /-band (F814W) images, to- 
gether with the pixel weight maps outputted by DRIZZLE 
(Fruchter & Hook 2002). The convolution mask and detec- 
tion parameters were adapted from those used by Williams 
et al. (1996). In particular, we use a comparatively low S/N 
detection threshold, DETECT.THRESH, of 1.3. This af- 
fords recovery of faint galaxies and minimizes incomplete- 
ness, at the expense of many false-positive 'detections' of 
noise, which need to be flagged and filtered out later (see 
§2.2). Stars with CLASS_STAR > 97% are immediately dis- 
carded, as we wish to model only galaxies. The image is then 
segmented into small square 'postage stamp' regions around 
the remaining galaxies. The sizes of these regions are set to 
(3 x AJMAGE+5) pixels square, where AJMAGE is a mea- 
sure of the galaxy's major axis provided by SExtractor. This 
area is slightly smaller than those illustrated in figure 1; it 
is compact enough to be computationally efficient, but large 
enough to ensure that the shapelet basis functions are close 
to zero at the boundaries of the image. 

This prescription conveniently leaves a border of sky 
background and noise around the edge of each image. We 
use all of the border pixels that do not belong to any other 
object in the SExtractor catalogue to locally renormalise 
the pixel weight map. As noticed by Williams et al. (1996), 
the inverse variance map outputted during the data reduc- 
tion of the HDF systematically overestimates the noise by 
a factor of a few. This bias also varies as a function of po- 
sition around the image. While SExtractor requires only 
relative weights between pixels, and is thus unaffected by 
this bias, we need to calibrate the absolute value of noise for 
the shapelet decomposition. 



2.2 Shapelet modelling 

Shapelets are a complete, orthonormal set of 2D basis func- 
tions. A linear combination of these functions can be used to 
model any image, in a similar way to Fourier or wavelet syn- 
thesis. The shapelet decomposition is particularly efficient 
for images localised in space, such as those of individual 
galaxies. The formalism was first introduced in Shapelets I, 
and a related method has also been independently suggested 
by Bernstein & Jarvis (2002). 

For the polar shapelet analysis, the surface brightness 
/(x) of an object can be written as 

oo n 

/( X ) = X! X! a »™Xnm(x-x c ;/3) , (1) 

n— m— — n 

where /3 is a scale parameter, and x c is the position of the 
centre of the basis functions. Only combinations of n and m 
where both are even or both are odd should be included in 
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Figure 1. Shapclct modelling of a selection of HDF /-band galax- 
ies. Higher S/N galaxies typically require more shapelet coef- 
ficients so we display a variety of source galaxies, noting the 
shapelet required to reach a reconstruction with Xr = 1- 

In all cases, the first column shows the original HDF image; the 
middle column shows the shapelet model; the right column shows 
the residual. The image size and colour scale is different for each 
row. 
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where L(x) are the Laguerre polynomials (see e.g. Boas 
1983). The index n describes the radial oscillations and the 
index m describes the order of rotational symmetry. Or- 
thonormality ensures that the shapelet coefficients a nm are 
given by 
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These are Gaussian-weighted multipole moments of the sur- 
face brightness, familiar in several branches of astronomy. 

For reasonable choices of the centroid x c and scale size 
P, the galaxy shape information is contained within only 
the first few shapelet coefficients. The series in equation (1) 
can then be truncated at some finite order n max . In order to 
make good choices for x c , f3 and ^max) We first define xl as 
the difference between the original and reconstructed image, 
renormalised with respect to the local noise level. We then 
attempt to find the values of x c and /3 which achieve Xr = 1 
with the fewest possible shapelet coefficients, or minimum 
n m ax- Shapelet coefficients with higher n can be discarded 
and the shapelet model will still be consistent with the data. 

A practical algorithm to perform this optimisation by 
iteratively exploring {x c , /3, n max } space is described in 
Massey & Refregicr (2003). The algorithm creates a cata- 
logue of optimised shapelet decompositions for ~ 500 ob- 
jects per square arcminute in the HDFs. However, this rep- 
resents only 81% of the 'objects' detected by SExtractor. 
Approximately two dozen of the brightest of these galax- 
ies require a decomposition with n max > 15 to achieve 
Xr < 1. To reduce the dimensionality in later analysis, these 
paramctrizations are truncated at this point regardless, with 
x c and p chosen to give the best possible, if slightly imper- 
fect, shapelet fit. The algorithm also fails to converge to 
fits with Xr < 1 for a further 42 galaxies in close pairs, as 
identified by the SExtractor segmentation map; 36 galaxies 
because of their proximity to bright stars or the edge of the 
image; and 60 more objects across the HDFs (about 10% 
of all SExtractor detections), which are mainly false detec- 
tions of noise due to the low S/N detection threshold set in 
section §2.1. Note that the number of decompositions which 
fail due to contamination from a near neighbour is roughly 
independent of magnitude. Indeed, the slope of the num- 
ber counts for galaxy pair members is within la of that for 
all the galaxies in the HDF: therefore this particular effect 
should not introduce any bias. 

Figure 1 displays a selection of HDF galaxies at vari- 
ous S/N levels, and their shapelet reconstructions (see also 
Shapelets I, figures 3 and 4). Faint galaxies typically require 
an rimax of only 2, 3 or 4, while brighter, larger objects re- 
quire an increasing number of shapelet coefficients to model 
their greater degree of detail. The right-hand column of fig- 
ure 1 shows the reconstruction residuals, which are consis- 
tent with noise even for irregular galaxy morphologies. 
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2.3 Treatment of the PSF 

During the modelling of galaxy shapes, we must in general 
account for the PSF of the WFPC2 camera that has smeared 
the HDF images. Since our objective here is to simulate only 
HST images, we do not apply any correction. The PSF will 
be naturally contained within the shapelet parametrization 
of the galaxy images and these are intentionally left unal- 
tered. When we create simulated images, they will automati- 
cally have been smeared by the WFPC2 PSF: effectively cir- 
cularised on average, because of the random reorientation of 
the new galaxies. 

However, for other applications it may be desir- 
able to simulate observations from other tele- 
scopes such as the JWST (http://www.stsci.edu/ngst/), 
SNAP (http://snap.lbl.gov/) or GAIA (http: //astro, 
esa. int/gaia/). It would then be necessary to take ac- 
count of their different instrumental properties. The ideal 
way to do this would be to deconvolve HDF galaxies from the 
WFPC2 PSF analytically in shapelet space (see Shapelets II 
§3), and then to reconvolve simulated galaxies with a new 
PSF at the end. Unfortunately, we have found this method 
difficult to implement in practice. The process of decon- 
volution naturally pushes information into high-n and m 
shapelet coefficients, as shown in Shapelets I figure 8. Al- 
though the ensuing galaxy reconstructions are still realis- 
tic, information about the overall galaxy morphology distri- 
bution is spread thinly over an increased number of coeffi- 
cients. This distribution is no longer sufficiently well sampled 
by galaxies in the HDFs for the smoothing-and-resampling 
method presented in §4 to be effective. 

An alternative solution exists to simulate images with 
a PSF of the same size or larger than that of HST. The 
WFPC2 PSF can be conveniently maintained throughout 
the simulations, and the images convolved again at the end 
with a second, 'difference' kernel. This kernel is intended to 
make up the difference between the original PSF of WFPC2 
and that of the new instrument. It can be obtained by decon- 
volving the WFPC2 PSF from the new PSF, an operation 
performed easily in shapelet space (see Shapelets II §3). An 
example of this method can be seen in Massey et al. 2003. 



3 ADVANTAGES AND DISADVANTAGES OF 
USING SHAPELETS 

3.1 Advantages of shapelets 

Figure 1 demonstrates the superb quality of shapelet-based 
image reconstruction possible for all galaxy morphologies. 
Particularly for spiral or irregular galaxies, we find the 
shapelet models superior to those using traditional radial 
profiles alone e.g. GALFIT (Peng et al. 2002). That paper 
contains plots similar to figure 1; but with much worse resid- 
uals. 

There are also many more advantages to using the 
shapelet parametrization for image simulations. For exam- 
ple, the truncation in n max produces data compression by 
setting a minimum and maximum physical scale of interest 
(see discussion in Shapelets I). The discarded high-n order 
coefficients contain a small amount of high spatial frequency 
information. But because we have ensured that the recon- 
struction has been pursued up to an order n max such that 



Xr < 1, we know that the high-frequency remainder is con- 
sistent with noise. Usefully for astronomy, the resolution of 
a shapelet model is also greatest near its centre. The com- 
pression factor for typical galaxy morphologies can be as 
high as 50 (Shapelets I). Furthermore, this compression is 
achieved through a parameter-independent truncation of a 
series. With its complete basis set, shapelets avoid the re- 
quirement in GALFIT or GIM2D (Simard 1998) to specify in 
advance the number and type of profiles for each model. 
A Karhunen-Loeve decomposition would also require mod- 
els to be specified in advance for both the image and the 
noise. Furthermore, the orthonormality of the shapelet ba- 
sis set guarantees a unique and linear one-to-one mapping 
from the image plane to the coefficients. This advantage, 
and many of shapelets' convenient mathematical properties 
are lost to methods using an over-complete basis set such as 
Pixon (Pifia & Puetter 1993). 

It is mainly for these convenient mathematical proper- 
ties that we choose to model galaxies using shapelets. For 
example, an object's orientation is controlled to first order 
by the phase of the 022 coefficient (corresponding to the posi- 
tion angle of the object's ellipticity); and its chirality (hand- 
edness) by the relative phase of the a^2 coefficient. The first 
can easily be factored out of the parametrization, so that 
the ellipticity of all objects becomes aligned to the horizon- 
tal axis. The image can then be flipped, if necessary, so that 
the sign of the 042 phase is positive and the outer isophotes 
of all objects twist in the same ant i- clockwise sense. Cor- 
relations between remaining shapelet coefficients a nm are 
of course maintained in order to preserve the morphology of 
the galaxy. Any two well-sampled objects which are identical 
apart from their orientation will then decompose into identi- 
cal shapelet coefficients. This greatly increases the sampling 
density of the galaxy morphology distribution. Simulated 
galaxies will later be randomly rerotated and flipped as they 
are created. 

Shapelets are also designed to be convenient for many 
aspects of image manipulation and post-processing. Since 
the shapelet basis functions Xnm are the eigenfunctions of 
the 2D Quantum Harmonic Oscillator, they are invariant 
under Fourier transform up to a phase factor. This renders 
convolutions (e.g. with a PSF) easy and quick to perform. It 
also suggests a well-developed mathematical notation from 
quantum mechanics. Convolutions become a bra-ket matrix 
multiplication (see Shapelets II). Translations and rotations, 
useful for simulating dithered images, are described to first 
order by a few applications of a and ci 1 ladder operators. 
So too are the distorting shears produced by both optical 
aberrations within a telescope and weak gravitational lens- 
ing within galaxy clusters (see Shapelets I) . 

3.2 Disadvantages of shapelets 

There are two main criticisms often levelled at shapelets. 
The first is that a Gaussian-Laguerre expansion may not 
easily capture the extended wings of many galaxies. After 
truncation in n max , the shapelet basis set is left incomplete 
and not ideally matched to typical exponential or de Vau- 
couleurs profiles. A demonstration that our algorithm does 
select sufficiently high n max is the remarkable match in the 
concentration index between shapelet models and real galax- 
ies shown in §6. In fact, the ability of a shapelet decompo- 
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sition to recognise correlations between adjacent pixels may 
even enable it to extend further than SExtractor into the 
wings of a faint object at the threshold of detection, where 
flux in individual pixels is lost beneath the noise. 

A potentially greater problem for our simulations is the 
second criticism that a shapelet decomposition can produce 
artefacts when it is truncated. Indeed, any truncated basis 
set that is complete rather than over-complete will be sub- 
ject to spurious residuals that resemble one basis state, due 
to the near-cancelling of large positive and negative coeffi- 
cients in others. For shapelets, this emerges as ringing, and 
is particularly apparent after PSF deconvolution or around 
long and thin galaxies, which are less well-matched to the 
circular basis functions. Furthermore, a desire to keep the 
shapelet decomposition method linear prevents the impo- 
sition of a positive-definite constraint. The spurious resid- 
uals can therefore appear as either extra positive flux or 
negative holes. However, we note that this occurs widely in 
other methods, including wavelets, where it is only removed 
by a (non-linear) projection in wavelet space onto the sub- 
space of positive solutions. While most low-level residuals 
will be lost in the final simulated images beneath even mod- 
est background noise, we turn around this disadvantage in 
§5.1. There we use the absence of any negative holes in a 
noise-free image as a first-order diagnostic that the morphed 
shapes of simulated galaxies are realistic. 



4 SHAPELET PARAMETER SPACE 

4.1 The multi-dimensional Hubble Tuning Fork 

A sample of galaxy morphologies can be thought of as a dis- 
tribution of points in a multi-dimensional shape parameter 
space. The axes in this space might represent size, mag- 
nitude, position angle (P.A.) and so on. Each point corre- 
sponds to a particular galaxy with a specific morphology, 
and various correlations may emerge between variables. For 
example, the classic Hubble tuning-fork diagram (Hubble 
1926, Sandage 1961, de Vaucouleurs 1959) relates the object 
ellipticity, the bulge/disc ratio, and the extent to which the 
spiral arms are unwound. GIM2D (Simard 1998) and GALFIT 
(Peng 2002) software use axes parametrized by the rela- 
tive amounts of exponential or de Vaucouleurs/ Sersic func- 
tions (de Vaucouleurs 1959, Sersic 1968) required to fit the 
galaxy's radial profile. 



4.2 The multi-dimensional Shapelet Tuning Fork 

In this work, we instead choose the axes of our galaxy 
morphology distribution to be the magnitudes and complex 
phases of the polar shapelet coefficients. First, we describe 
the properties of this 'shapelet parameter space'. In the fol- 
lowing section, we will then argue that the underlying Prob- 
ability Density Function (PDF) of galaxy morphology is rel- 
atively simple in this parameter space and may be recovered 
from a finite sample like the HDFs. 

Projections of shapelet parameter space are shown in 
figure 2. Each point in the top-left panel represents a data 
vector encoding the shape information about one HDF 
galaxy. Collectively, they describe the overall morphology 
distribution of distant galaxies. The rotations and reflections 
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Figure 2. Phase space correlations and smoothing in the shapelet 
parameter space. The top left panel displays the position of mea- 
sured HDF galaxies along two axes of shapelet space; the top right 
panel shows the probability distribution produced by smoothing 
this distribution. The other left panels display further projec- 
tions of the PDF onto shapelet coefficient, size and magnitude 
axes, while the remaining right panels display phase correlations 
between shapelet coefficients. The colour scale is logarithmic in 
the bottom left panel. 



used to pre-align galaxies have considerably compressed this 
space (without loss of information) and allowed it to be more 
densely sampled by only a finite number of galaxies. No- 
tice that there are correlations evident in the parameters, 
which correspond to the construction of the familiar shapes 
of galaxies. In the middle-left plot, for example, the scat- 
ter of ellipticity values widens for faint galaxies which are 
known to be more irregular. In the bottom-right plot, devi- 
ations from the diagonal show twisting isophotes that can 
grow, with higher order basis functions, into spiral arms. It 
is also important to notice that some regions of parameter 
space are empty. A random set of shapelet coefficients will 
not produce a realistic galaxy shape: there is not even a 
positive definite constraint imposed upon an image in the 
shapelet formalism. 

Two other axes are required for our parameter space, 
since real galaxy morphologies clearly vary as a function of 
size and magnitude (e.g. figure 5). Storing the shapelet scale 
factor [3 (see §2.2) allows large HDF galaxies to occupy dif- 
ferent regions of parameter space to small ones. Similarly, 
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using magnitude as a parameter allows galaxies of different 
luminosities to have different shapes. Since shapelet coef- 
ficients (including aoo) scale as the flux, once we include 
magnitude as an independent parameter, we can divide all 
a.nm by aoo- This removes explicit magnitude dependence 
from these quantities and coincidentally ensures a conve- 
nient version of adaptive smoothing at a later stage (see 
§4.4). The degenerate parameter aoo = 1 is now removed, 
and size and magnitude are treated in the same way as any 
other axis of the parameter space from now on. 

Note that any orthogonal transformation of the shapelet 
basis functions would maintain their useful properties of 
completeness, orthogonality and Fourier transform invari- 
ance. For instance, the Cartesian version of shapelets can be 
used instead (see Shapelets I), but without the convenient 
factoring out of the object's orientation and handedness. 
Using principal components analysis (PC A; e.g. Francis & 
Wills 1999), it is possible to calculate the optimal linear com- 
bination of shapelet coefficients to quantitatively describe 
galaxy morphology with the fewest numbers. However, both 
elliptical and spiral galaxy shapes are already quite simple 
to manufacture by specifying only a few polar shapelet co- 
efficients; we therefore avoid the extra complication of PCA 
in this paper. Of course, the principle components of galaxy 
morphology are interesting in their own right. These are be- 
ing studied elsewhere. 

4.3 Recovery of a smooth underlying PDF 

The top-left panel of figure 2 shows a slice through the 
parameter space of galaxy morphologies, populated by 5- 
functions representing real, observed shapes. Unlike a dis- 
tribution parametrized simply by bulge/disc ratios and disc 
inclination angles, it is not obvious a priori that an under- 
lying, smooth PDF should exist for galaxy morphologies in 
shapelet space. However, the compact shapelet representa- 
tion of astronomical objects suggests that this ought to be 
the case, and we will attempt to recover it by smoothing this 
parameter space. 

Once the validity of the smoothed PDF has been estab- 
lished, it will be a simple matter to resample it and thus 
to synthesise a population of galaxies. Monte Carlo tech- 
niques can be used to generate unlimited numbers of realis- 
tic galaxies in this fashion, to fill any amount of sky area in 
a simulated imaging survey. 

The remaining panels of figure 2 demonstrate that the 
parameter space is indeed smooth in those places where it is 
well sampled. We assume that some other regions are equally 
smooth, but poorly sampled because of the finite number of 
galaxies in the HDF. We note that voids are also expected 
in the parameter space, where the shapelet expansions do 
not correspond to realistic galaxy shapes. We will therefore 
be careful not to smooth the PDF with large smoothing 
lengths which would significantly encroach upon these voids. 
However, limited perturbations around HDF galaxies may 
indeed recover realistic morphologies. 

Without an explicitly physical model of galaxy mor- 
phology and evolution built in to shapelets, it is the final 
results that must provide the ultimate verification of our 
statistical method. In §5, we show that it is indeed possible 
to find a smoothing length for the PDF that recovers objects 
which appear to represent realistic shapes. In §6 we demon- 



strate quantitatively that their global properties are realis- 
tic, by comparing real and simulated populations of galaxies 
via morphology diagnostics commonly used on deep images. 



4.4 Multivariate kernel smoothing method 

Many practical approaches have been devised to smooth dis- 
crete samplings of a multivariate PDF. Our main constraint 
in selecting one of these methods is the very high dimension- 
ality of our data set. The median n max required for objects 
in the HDF is 4. However, even with the efficient data com- 
pression that shapelets can afford, models of the highest S /N 
galaxies use values for n max as high as 15. Adding object size 
and magnitude, this corresponds to 137 total coefficients, 
and this is therefore the maximum number of dimensions 
required. 

To smooth and resample this dataset, we have chosen 
the Kernel smoothing method which is eloquently reviewed 
by Silverman (1986). Kernel smoothing can be considered as 
an alternative to using histograms. It avoids the ambiguity 
of binning and instead yields a smooth analytic curve. For 
1-dimensional data, each sample data point is replaced by 
a smooth Gaussian kernel. To create a PDF, all the kernels 
can be summed and then normalised to integrate to unity. 
The width of these smoothing Gaussians still remains to 
be decided, but methods exist for optimising this factor. 
Each kernel can even be given a different width, calculated 
as a function of a quick local density estimate, in order to 
produce adaptive smoothing. 

In data with more than one dimension, each sample 
point is replaced by a multivariate kernel. To help over- 
come the difficulties associated with the leaking of proba- 
bility density into the wings of many-dimensional kernels, 
we replace the Gaussian with a more compact Epanechnikov 
kernel (Epanechnikov 1969), 

K{Sxi) = ! 3§-(M^) 2 ) to (4) 
[ elsewhere, 

where we have reformatted the shapelet coefficients of each 
HDF galaxy into a data vector Xi , and 5xi are deviations in 
shapelet space from these real data points. In each case, i is 
a coefficient index running from 1 to 137. A^ are smoothing 
widths which will be determined for each direction of our 
PDF space in §5.1. Isodensity contours of this kernel are 
multivariate ellipses whose axes are aligned with those of 
the co-ordinate axes (see figure 2) . In general, they could be 
allowed to point in any direction (Sain 1999) , but we do not 
find this to be necessary. 

We implement an adaptive smoothing of our PDF by 
reparametrizing a nm as 3^m._ Given a constant Aj, this cre- 
ates an effective smoothing kernel for each object of widths 
K = aooAi. This functional form is useful because the 
brighter HDF objects are less frequent, and are therefore 
more isolated in probability space. Since aoo roughly corre- 
lates to total flux, we obtain a larger smoothing radius for 
brighter objects and better recover the underlying probabil- 
ity distribution. We will prove that this recipe does produce 
realistic morphologies in §6. 
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Figure 3. The effect of perturbing galaxy morphologies in shapclet space. Each image in the top row shows a real HDF galaxy, rotated 
by random angles. Its shapelet coefficients arc incrementally perturbed in successive rows, although its overall flux is kept constant for 
the purpose of this plot. A degree of perturbation corresponding to our choice of the smoothing length Ai is shown inside the box: 
these images represent typical simulated galaxies. Perturbations larger than Ai produce objects which contain significantly negative pixel 
values. The left panel depicts a spiral galaxy; the right panel a more typical irregular form. The colour scale is logarithmic. 



5 IMAGE GENERATION 

5.1 Resampling the galaxy morphology PDF 

Having recovered a realistic and analytic PDF of galaxy 
morphologies, we now wish to resample this distribution to 
generate brand new galaxy populations. The main advan- 
tage of the kernel smoothing approach now becomes ap- 
parent. Without resorting to costly numerical integration, 
Silverman (1986) §6.4.1 presents a quick bootstrap method 
to generate a Monte-Carlo sample from a PDF constructed 
with S- functions smoothed by kernels K(8x). We take the 
following steps to simultaneously smooth and resample the 
parameter space of HDF galaxies: 

Step 1 : Randomly select one of the original HDF 
galaxies, uniformly and with replacement. 

Step 2 : Generate a small perturbation Sxi from 

the probability density function K(Sxi). > (5) 

Step 3 : Add Sxi to the shapelet coefficients Xi of 
the HDF galaxy. This simulates a new 
galaxy, sampled from the overall PDF. 



This approach is arrived at by simply regarding the PDF 
as a sum of small kernels rather than one overall function. 
Individually, these kernels are quick to compute; and the 
dimensionality of the PDF can even be lowered for faint ob- 
jects that require fewer coefficients. The perturbations can 
be quickly sampled from an Epanechnikov kernel K(5x) by 
generating three random numbers from a uniform probabil- 
ity distribution between — Ai and Ai. If the first does not 
have the highest absolute value, take it and discard the rest; 
otherwise take the second. Iterating this procedure to gen- 
erate sufficient objects for a simulated Hubble Deep Field 
requires only a few minutes on a 1GHz PC. 

We must now decide how to choose the overall smooth- 
ing length Ai. If Ai = 0, the kernel is a S- function and the 
original HDF objects are recovered exactly. This arrange- 
ment will create simulations of limited practical use, but in 
§6 they act as an intermediate test of the shapelet decom- 
position. As Ai — > co, the coefficients for simulated galaxies 
become completely random and the objects become unreal- 
istic. In this limit, since no positive-definite constraint is ever 
imposed in the shapelets formalism, we find that simulated 
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Figure 4. Sample HDF-dcpth simulated /-band image, 30"x30". 



objects exhibit undesirable holes of negative flux. Figure 3 
shows realisations of how a typical galaxy from the HDF 
is altered by increasingly large perturbations to its shapelet 
coefficients, showing negative flux for large Xi perturbations. 

We therefore require a choice of which is sufficiently 
large to produce new galaxies, yet sufficiently small to main- 
tain realistic morphological properties. By measuring the 
minimum pixel values of many different galaxy realisations, 
we find suitable results if \ p hase 15° and X mo duli 
4x [mean separation between nearest neighbours in that di- 
mension]; beyond these values, negative holes rapidly ap- 
pear. For the purposes of this paper, we therefore fix Xi to 
these limiting values. This still represents relatively weak 
smoothing, but the variety and realism of generated mor- 
phologies is pleasantly surprising: polar shapelets are in- 
deed sufficiently close to the Principal Components of galaxy 
morphology that small perturbations in shapelet space cor- 
respond to reasonable and realistic changes within galaxy 
types. A quantitative demonstration of these remarkable re- 
sults is presented in §6. 

5.2 Scattering galaxies on the sky 

A Monte Carlo population of genuinely new yet realistic ob- 
jects has been extracted from the PDF of galaxy morphol- 
ogy. These galaxies are now allocated random orientations 
and locations on the sky, at a density of 700 per arcmin 2 . 
This constant has been calibrated to recover the same total 
number counts, after the addition of noise, as are measured 
in the HDFs (see §6.1). No attempt is made here to correctly 
model the 2-point correlation function of galaxy positions, 
or to include galaxy mergers beyond those sufficiently ad- 
vanced to appear as one object in the input SExtractor 
catalogue. 

The correct slope in the size and magnitude distribu- 
tions are automatically ensured over a wide range of va- 




Figure 5. Section of the real HDF, with the same size and scale 
as figure 4. 



lidity, since size and magnitude are intrinsic variables of 
the PDF (see the bottom-left panel of figure 2). However, 
it is important to consider the question of completeness in 
our simulations for very faint galaxies. A discrepancy could 
arise through either non-detections of faint HDF galaxies 
by SExtractor or non-convergence of their shapelet decom- 
positions. The first effect is minimised by our choice of 
SExtractor parameters (see §2.1) and the second is shown 
in §2.2 to be under control. However, the number counts of 
galaxies at the very faint end (/ Si 29) are also highly sen- 
sitive to the the precise background noise properties (see 
§5.3). For this reason, we choose not to consider galaxies 
fainter than I = 29. 

At the bright end, we also expect the simulations to 
be incomplete, since the HDFs were intentionally chosen by 
STScI as areas containing few large, bright galaxies. In the 
future, we will extend our simulations in this respect by 
incorporating 'Groth survey strip' (Groth et al. 1994) and 
ACS galaxies into the object source catalogue. One could 
also compensate for any known incompleteness by preferen- 
tially selecting for under-represented galaxy types in step 1 
of procedure (5). 

5.3 Modelling telescope and observational effects 

The shapelet models of galaxy images are actually analytic 
functions. These can quickly be convolved with a PSF that 
has also been decomposed into shapelets, using the matrix 
operation in Shapelets II §3.1. Stars can also be included 
in an image, given a magnitude distribution, by repeat- 
edly placing the shapelet model of the PSF in an image 
at the appropriate flux amplitude. All of these analytic ob- 
jects are then integrated within square pixels of the same 
0.0398"resolution as the DRIZZLEd Hubble Deep Field. Our 
images have a somewhat larger solid angle than the HDFs 
because the missing quarter from the WFPC 'L' is restored. 
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Observational noise can now be added, at a level appro- 
priate to the desired exposure time. We have simply added 
photon counting noise (proportional to the square root of 
the raw pixel values), and Gaussian background noise (with 
an amplitude determined from the HDF itself). However, it 
would be easy to add a background level, cosmic rays and 
even instrumental distortions: the shearing for which could 
be performed conveniently in shapelet space before pixelli- 
sation. A further effect, not included in our simple model, 
is noise that is correlated between adjacent pixels. Alias- 
ing occurs as a side-effect of the DRIZZLE algorithm, which 
recovers image resolution by stacking several dithered expo- 
sures. This aliasing can make it possible to detect slightly 
fainter objects and also introduces some spurious objects at 
very low S/N. The steep slope of the real number counts 
beyond 1 — 29 exacerbates this problem, and we would not 
yet trust the noise model on our simulations for galaxies any 
fainter than this. 

Final output is as a FITS image, a sample of which 
is displayed in figure 4. Larger images may be down- 
loaded via anonymous ftp from the shapelets web page at 
http://www.ast.cam.ac.uk/~rjm/shapelets. Notice the 
wide range of galaxy morphologies and behaviours present 
in figure 4. In particular, features resembling spiral arms, 
dust lanes and resolved knots of star formation are present, 
together with various radial profile shapes. By eye, the simu- 
lated galaxies look very similar to those in a similarly-scaled 
section of the HDF itself, reproduced in figure 5. We will 
quantitatively examine whether our simulation effectively 
mimics the morphology distribution of HDF galaxies in the 
following section. 



6 STATISTICAL TESTS AND RESULTS 

We now demonstrate quantitatively that our simulated im- 
ages are realistic, in the sense that commonly used mor- 
phology measures for our galaxies match the distributions of 
those for galaxies in the HDFs. First, we consider the num- 
ber counts and size distributions, using photometry and size 
measures from SExtractor (Bertin & Arnouts 1996). These 
ought to be roughly consistent by construction, because they 
are closely related to two of the axes in our parameter space. 
Then we compare more detailed morphology measures, such 
as concentration (Bershady et al. 2000), asymmetry (Con- 
selice et al. 2000a), dumpiness (Conselice et al. in prepara- 
tion) and ellipticity. These are not automatically expected 
to match, because our shapelet-based PDF does not di- 
rectly represent these quantities. Thus, a comparison be- 
tween these properties for simulated and real data provides 
a rigorous and fair test of how realistic our simulations are. 



6.1 Size and magnitude 

In order to carry out these tests, we first apply the 
SExtractor object-finding and shape measurement pack- 
age on the version 2 reductions of the HDF-N and HDF-S 
(Williams et al. 1996, 1998), together with a 6 arcmin 2 sim- 
ulated image of the same depth. As an intermediate test, we 
also analyse a simulated image containing shapelet recon- 
structions of galaxies drawn from a PDF left as 5-functions. 
These should be identical to the objects in the HDF and act 
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Figure 6. Size us magnitude plane for 6 arcmin 2 /-band images 
to HDF depth, measured with SExtractor. Top-left panel: for 
a simulated image containing shapelet reconstructions of HDF 
galaxies (the PDF kept as (5-functions). Top-right panel: for a 
simulated image with galaxies perturbed in shapelet space. Bot- 
tom panels: for real galaxies in the Hubble Deep Fields North and 
South, calculated using the same SExtractor input parameters as 
reference. The stellar locus is omitted from all panels. 



as a test of the shapelets modelling procedure rather than 
the perturbations in shapelet space. In all four cases, approx- 
imately 320 galaxies brighter than / < 29 were detected per 
arcmin 2 . For the galaxies only, we extracted observed mag- 
nitudes (MAGJ3EST) and sizes (FWHM JMAGE) . 

Figure 6 compares the size vs magnitude distributions 
of the simulated images with those of the two HDFs, ex- 
cluding the stars. Figure 7 then shows the galaxy number 
counts for real and simulated cases in more detail. These 
match well over six or more orders of magnitude, whether 
the simulations used a 5-function PDF or the full version. 
Note, however, that the noise in the simulated images is not 
aliased in the same way as the DRIZZLE algorithm has caused 
the real data to become (see §5.3). The number counts be- 
yond / ~ 29 are highly sensitive to background noise prop- 
erties, and are indeed increased in the simulated image if we 
artificially smooth the noise. Clearly DRIZZLE is something 
that needs further attention in a future implementation. 

For the present purposes, we apply magnitude cuts and 
compare only the brighter objects, which are unaffected by 
such minor changes. These cuts are at levels determined 
by the stability of an individual diagnostic to noise. Fig- 
ure 8 compares the size distribution of the simulated ob- 
jects brighter than / = 29 with those of the HDF galax- 
ies, as found by SExtractor. We find that there is excellent 
agreement in the shape of this distribution: the median and 
standard deviation FWHM for real galaxies in the HDFs 
are 0.30" and 0.24". For simulated objects, these figures are 
0.3l"and 0.23". This agreement comes about partly (but not 
entirely) by construction. It was somewhat expected that 
our simulated images will closely match real data in terms 
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Figure 7. Number counts in simulated /-band images (solid red), 
normalised by area on the sky. Also shown are number counts for 
the Hubble Deep Field North (dot-dashed) and South (dashed). 
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Figure 8. Size distribution of objects in a 6 arcmin 2 simulated 
image with limiting magnitude I = 29 (solid red). Also shown are 
size distributions for the Hubble Deep Field North (dot-dashed) 
and South (dashed). 



of their magnitude and size distributions, but the final high 
precision is encouraging. 

6.2 Galaxy morphology diagnostics 

We can more stringently test the reliability of our algo- 
rithm to reproduce properties of real galaxies by measuring 



Figure 9. Ellipticity distribution, as denned in equations (6) and 
(7), of objects in 6 arcmin 2 simulated image with limiting magni- 
tude I = 29 (solid line). Also shown is the ellipticity distribution 
for the Hubble Deep Fields North (dot-dashed) & South (dashed), 
and a Gaussian with the same mean and rms (dotted). 



morphological parameters which are entirely independent of 
shapelets. We apply a series of commonly used morphology 
diagnostics to two different realisations of the simulated im- 
ages. A first version, containing unaltered shapelet models 
of HDF galaxies, tests the shapelet modelling process in iso- 
lation. A second simulated image, with galaxies drawn from 
the fully smoothed morphology PDF tests the fairness of 
these perturbations in shapelet space. 

A first basic analysis is to determine the gross shape of 
galaxies, i.e. their ellipticities. The ellipticity of all the galax- 
ies was obtained from SExtractor. Following a convention 
in weak lensing literature, we here define two independent 
components of ellipticity as 



ei 



AJMAGE - BJMAGE 
AJMAGE 2 + BJMAGE 2 

AJMAGE 2 - BJMAGE 2 
AJMAGE 2 + BJMAGE 2 



cos(2 x THETA JMAGE) (6) 



sin(2 x THETA JMAGE) (7) 



where AJMAGE and BJMAGE are the lengths of the ma- 
jor and minor axes of the ellipse, and THETAJMAGE is 
the angle between the major axis and the horizontal (all pa- 
rameters supplied by SExtractor). Figure 9 compares this 
ellipticity distribution of the real and fully simulated objects 
brighter than I = 29. Again, these are in excel lent agree- 
ment: with standard deviations in e = ^/e 2 + e 2 of 0.64 for 
real data, 0.62 for simulated data using a 5-function PDF 
and 0.62 for simulated data using the full PDF. 

The four images have also been passed through the 
model-independent morphology software developed by Con- 
selice et al. (2002a), Bershady et al. (2000) and Conselice 
(2003), in order to measure the concentrations (C), asym- 
metries (A) and dumpiness (S) values of the real and simu- 
lated galaxies. We first describe how these three quantities 
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Figure 10. Concentration vs asymmetry, as defined in equations 
(8) and (9), for 6 arcmin 2 images with limiting magnitude / = 26. 
Top-left panel: for a simulated /-band image containing shapelet 
reconstructions of HDF galaxies (the PDF kept as <5- functions). 
Top-right panel: for a simulated image with galaxies perturbed 
in shapelet space. Bottom panels: for real galaxies in the Hubble 
Deep Fields North and South. 
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Figure 11. Concentration vs Petrosian radius, as defined in equa- 
tion (8) and the text, for 6 arcmin 2 square images with limiting 
magnitude / = 26. Panels are ordered as in Figure 10. 
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Figure 12. Asymmetry vs dumpiness, as defined in equations (9) 
and (10), for 6 arcmin 2 square images with limiting magnitude I 
= 25. Panels are ordered as in Figure 10. 



are calculated, and then compare the distributions obtained 
for these measures from real data and simulations. These 
'CAS' parameters are very informative, as all nearby galaxy 
types (ellipticals, spirals, dwarfs, etc.) fall in distinct regions 
of CAS space (Conselice 2003). These parameters thus cap- 
ture most of the variation in galaxy structures and have 
frequently been used for quantitative morphology classifica- 
tion. 

The concentration index, C, is defined in terms of the 
ratio of the radii containing 80% (rgo) and 20% (r2o) of the 
object's total flux: 



C 



x log 

Vr 20 



(8) 



For the total flux, we use the flux within an aperture 1.5 
times the size of the Petrosian radius at r\ — 0.2 (Bershady 
et al. 2000). The r\ parameter is defined as the ratio of the 
surface brightness at a radius divided by the surface bright- 
ness integrated within the radius, such that at the centre of 
a galaxy, ij = 1 and at the very edge of a galaxy (where its 
surface brightness is 0), r) = 0. 

Typical values of C for real galaxies range from ap- 
proximately 2 to 6. Galaxies with C > 4 are usually ellip- 
ticals or spheroidal systems: a galaxy with an r 1 / 4 profile 
has C = 5.2. A purely exponential disc galaxy has C = 2.7 
(Bershady et al. 2000). Objects with lower light concentra- 
tions are shown by Graham et al. (2001) to be systems with 
low central surface brightnesses and often low internal ve- 
locity dispersions. Low concentration values are also found 
for dwarf galaxies (e.g. Conselice et al. 2002). The concen- 
tration index thus correlates, within some scatter, with the 
total mass of a galaxy. 

The asymmetry index used in this paper (called Aiso in 
Conselice et al. 2000a,b) is calculated by rotating an image 
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by 180° and subtracting the it from the original. Then we 
evaluate 

J2 \Ix, v - ll S y |] m . [I] l-Bi.y - B^ 8 °| 



, (9) 



where I x , y is the galaxy surface brightness in the (x, y) pixel 
of the image, B x ,y the sky background in the same pixel, 
and superscripts denote rotations. Sums are over all pixels 
within the same rj — 0.2 Petrosian radius from which the 
total light measurement is made. Minimisation is then over 
different choices of the centre of rotation (see Conselice et 
al. 2000a). 

The asymmetry index is sensitive to any physi- 
cal processes in a galaxy that produce asymmetries in 
light distributions, such as star-formation, galaxy interac- 
tions/mergers, and projection effects such as dust lanes. 
There is a general correlation between the asymmetry value 
and the (B — V) colour (Conselice et al. 2000a). Since most 
galaxies are not edge-on systems, star formation and galaxy 
interactions/mergers are the dominant effects that produce 
asymmetries in real galaxies. These two effects can often be 
distinguished, however. Systems with asymmetries A > 0.35 
are generally created by interactions or mergers (Conselice 
2003, Conselice et al. 2003). However, other merger events 
can have more modest asymmetry values. From this and 
more detailed studies of the asymmetry index, it has been 
concluded that A is most sensitive to bulk structures in 
galaxies (Conselice 2003). 

The dumpiness parameter, S, is a measure of the high- 
spatial frequency component of galaxies. It is calculated by 
smoothing a galaxy's image with a smoothing length a, then 
subtracting this smoothed version I xy from the original im- 
age. This leaves a residual map containing only those fea- 
tures with a high-spatial frequency. Summation is again per- 
formed over pixels within the rj = 0.2 Petrosian radius, al- 
though those from the central cusp are ignored. Also includ- 
ing a correction for the background B Xjy , the dumpiness is 
then defined as 

y 



(10) 



The dumpiness index is sensitive to the instantaneous rate 
of star formation, and correlates very well with Ha equiva- 
lent widths; it also correlates to a lesser degree with broad- 
band colors (Conselice 2003). Other details of its calculation 
and properties are discussed in detail in Conselice (2003). 

We also use the Petrosian radius R (Petrosian 1976) 
to characterise the galaxies, defined as the position where 
q = 0.2. The Petrosian radius is found to be a better index 
than the SExtractor FWHM radius for determining mor- 
phological sizes, as SExtractor radii are based on isophotal 
thresholds which will represent different physical distances 
from the galactic centre depending on the distance to the 
galaxy. Because r\ is a ratio of surface brightnesses in a 
given galaxy, the run of rj with r in a galaxy is immune 
to many such types of systematic effects (Sandage & Perl- 
mutter 1990) and Petrosian radii are found to be a stable 
tool for deriving morphological parameters independent of 
distance (Bershady et al. 2000). 

We are now in a position to compare the measurements 
for C, A, S and R for real and simulated images. Projec- 
tions from this morphological parameter space for real and 
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Table 1. Galaxy morphology statistics. The first two columns 
show results for real objects, taken from the Hubble Deep Fields. 
Compare this with objects in simulations created using a <5- 
function PDF or the full shapelet-morphing procedure. 



simulated data are displayed in figures 10-12, and relevant 
statistics are compiled in table 1. 

As can be seen from the scatter in the plots, the agree- 
ment between simulations and real data is rather good: we 
are very pleased by the encouraging results. The matching 
distributions of the concentration parameter puts to rest 
one criticism frequently levelled at shapelets (see §3), that a 
truncated Gaussian-Laguerre expansion may not stretch far 
enough spatially to capture the extended wings of typical 
astronomical objects. Clearly our algorithm sets n max high 
enough to avoid this problem while still modelling the HDF 
galaxies using only a few coefficients. 

The final population of simulated galaxies does con- 
tain asymmetry values lower than those in the real data, 
although the distributions agree within la. This slight dis- 
crepancy is neither due to deficiencies in the shapelet mod- 
elling procedure, nor to the increased clustering of galaxies 
at short separations in real data, because it is absent from 
the simulation created with a 5-function PDF. Decreased 
object asymmetry must therefore be a by-product of the 
PDF smoothing. There is no obvious a priori reason why 
this should happen. Even m states are symmetric and odd 
m states anti-symmetric, so if the absolute values of all coef- 
ficients are randomly changed by the same amount, the over- 
all symmetry of the object should stay constant. However, 
our nearest-neighbour prescription from §5.1 results in an 
average smoothing length across typical even m states, and 
particularly the m = states, of approximately twice that 
for odd m states. This may simply be because the first state 
is even, and the smoothing length tends to get shorter as n 
increases. A more sophisticated adaptive smoothing method 
might be found to prevent this effect, but we have not pur- 
sued that idea here. We note the asymmetry discrepancy, 
but note also that it is relatively small. 

The behaviour of the dumpiness parameter is also rea- 
sonable. Truncation in shapelet modelling smooths galax- 
ies slightly, and thus removes the tail of objects with very 
high S. Morphing in shapelet space apparently acts to then 
smooth some of the galaxies further. This is peculiar be- 
cause, if anything, the galaxies in figure 3 appear by eye to 
become more clumpy as the smoothing length is increased. 
Overall, the agreement of the simulated distributions with 
real data is remarkably consistent with the field-to-field vari- 
ation between the two HDFs. Indeed, dumpiness is a rather 
unstable statistic to measure. For example, even the slight 
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rise in mean dumpiness for the (^-function simulation might 
be significant: especially since it is apparent despite the miss- 
ing tail at high S. It is possible that the increase is caused by 
residual artefacts in the shapelet models, but more plausibly 
because the noise in our simulated images is not correlated 
between adjacent pixels. The HDFs themselves have been 
DRIZZLEd in order to achieve their high resolution, a process 
which also aliases the image. As a simple approximation to 
this effect, we have tried smoothing the noise slightly in our 
simulations, by a top hat kernel 3 pixels wide. This pro- 
cess does indeed remove the slight disparity observed in the 
simulated dumpiness distribution, but simultaneously cre- 
ates many false detections of faint, circular objects from the 
noise at the magnitude limit around I > 29. 

Therefore we conclude that our shapelet simulations ob- 
tain similar morphology distributions to those found in real 
data. This is most encouraging as these were not arranged 
by construction, and the level of realism seen here is a strong 
vindication of the shapelet modelling of galaxies. Perturbing 
shapelet parameters to create new galaxies can introduce 
a few minor deviations, but these are small compared to 
natural variation between objects, and are well understood 
and quantified. We can therefore use shapelets as a tool for 
investigating galaxy morphology and for creating realistic 
simulated images. 



7 COMPARISON TO OTHER METHODS 

There have been many packages in the literature which sim- 
ulate astronomical observations, including Skymaker (see Er- 
ben et al. 2001) and artdata in IRAF (Tody 1993). These 
typically parametrize galaxy shapes using simple physical 
models such as ellipses with de Vaucouleurs or exponential 
profiles. The smooth variation allowed for these parameters 
enables them to generate an unlimited number of unique 
simulated galaxies. These methods are particularly valuable 
for simulating images from ground-based telescopes. Unfor- 
tunately, deep images from HST contain galaxies with re- 
solved features more complex than these analytical models, 
so such simulations are useful in only a limited regime. 

This was realised by Bouwens, Broadhurst & Silk 
(1998), who designed simulations to investigate the evolu- 
tion of galaxy morphology in the HDF. Indeed, their work 
succeeds in ruling out pure luminosity evolution of galax- 
ies: which precisely demonstrates the need for deep image 
simulations to contain more irregular and asymmetric mor- 
phologies. Their method repeatedly places the few brightest 
HDF galaxies onto a simulated image, and is similar to that 
which ours would have been, had we left the PDF as an (un- 
smoothed) sum of 5-functions. Some physics can be added 
to rescale and redshift these few sources, but it remains a 
very small population from which to simulate a large imag- 
ing survey, and containing members drawn exclusively from 
the local universe. Creating realistic images was not the in- 
tention of Bouwens, Broadhurst & Silk (1998) and, for our 
objectives, their method would require the addition of more 
physics {e.g. galaxy evolution, star formation histories, red- 
shift distributions, etc.). 

Our technique attempts to capture the best aspects of 
both methods, by defining a smooth parameter space that 
can yield an unlimited number of unique galaxies, but also 



contains a rich diversity of their morphologies (potentially 
any morphology, in fact, since the set of shapelet basis func- 
tions is complete). Since the parameter space is populated 
via statistical rather than physical arguments, it is the many 
tests to which we have subjected our simulated images that 
demonstrate the validity of our method. We find a regime 
spanning six orders of magnitude in luminosity where our 
simulations are valid, and their statistical properties match 
those of real data. This ability to produce simulated images 
containing galaxies with realistic morphologies is a signifi- 
cant advance. 



8 CONCLUSIONS 

We have presented a method for generating simulated deep 
sky images of an arbitrarily large survey area, as might be 
observed with extended observations with the Hubble Space 
Telescope. These simulated images are populated with all 
morphological types of galaxies, based upon those in the 
Hubble Deep Fields. 

The simulated galaxies are drawn from a multi-variate 
distribution of realistic morphologies, described using the 
shapelet formalism (Refregier 2003). In order to generate 
this morphology distribution, we decompose all HDF galax- 
ies into shapelet components using least-squares fitting. We 
optimise this decomposition by finding the scale length j3 
and number of modes n max which produces a best shapelet 
coefficient fit to each galaxy. The resulting coefficients of 
HDF galaxies form a cloud of points in shapelet space; these 
points are replaced by smooth kernels in order to recover the 
underlying probability distribution of real galaxy morpholo- 
gies. The smooth distribution is then resampled, using an 
unbiased Monte-Carlo technique, to obtain new galaxies. 

We place these simulated galaxies onto HDF-sized im- 
ages, simultaneously including effects such as PSF, pixelli- 
sation, photon shot noise and Gaussian background noise. 
The level of detail in the resulting simulated galaxies in- 
cludes features such as realistic radial profiles, spiral arms, 
dust lanes and resolved knots of star formation. 

We have noted that the global morphological properties 
of the simulated galaxy population must match those of real 
galaxies if our simulations are to be useful. We have demon- 
strated that this is the case by comparing various morphol- 
ogy diagnostics in simulated and real galaxies, including 
number counts as a function of magnitude, the size dis- 
tribution, ellipticity distribution, and concentration, asym- 
metry and dumpiness indices. A test involving purely the 
shapelet decomposition and reproduction of the HDF galax- 
ies preserves all of these statistics with high precision, and 
we conclude that a shapelet decomposition can successfully 
capture the morphological properties of all galaxy types. A 
few slight discrepancies are introduced to the statistics by 
perturbing their shapelet coefficients (or smoothing the mor- 
phology distribution) to manufacture genuinely new galaxy 
shapes. However, these differences are small compared to 
even the natural variations between objects. Several minor 
effects have been well quantified by our various tests, and 
their causes understood for correction in future implemen- 
tations. 

An important application for our simulated images is 
presented in Massey et al. (2003), where they are used to pre- 
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diet the sensitivity to weak gravitational lensing of the pro- 
posed SNAP satellite. However, the simulations presented 
here are in no way specific to gravitational lensing, and may 
be used for testing image analysis in various branches of as- 
tronomy. Further simulated images and catalogues are avail- 
able from the authors. 

A useful extension to this work will be to include 'Groth 
survey strip' (Groth et al. 1994) galaxies and ACS data when 
constructing the morphology probability distribution. This 
will provide future simulations with a more extensive sample 
of large, bright galaxies, improving the fidelity of the simu- 
lations in this region of parameter space. A method is also in 
development to generate multi-colour simulated images us- 
ing several HDF passbands and photometric redshifts. A by- 
product of this work is a complete morphological catalogue 
of all the HDF galaxies in shapelet space. This catalogue will 
be used in a future paper on the automated morphological 
classification of galaxies at high redshift. 
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